{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Azure AI Search vectorization using sentence-transformers\n",
    "\n",
    "This code demonstrates how to use Azure AI Search with a Hugging Face embedding model, [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2), and the Azure AI Search Documents Python SDK.\n",
    "\n",
    "It uses `azd` and a bicep template for all deployment steps so that you can focus on queries.\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "+ Follow the instructions in the [readme](./readme.md) to deploy all Azure resources, and to create and load the search index.\n",
    "\n",
    "+ Check your search service to make sure the index exists. If you don't see an index, revisit the readme and run the `setup_search_service` script.\n",
    "\n",
    "+ Don't add an `.env` file to this folder. Environment variables are read from the `azd` deployment.\n",
    "\n",
    "+ Install the packages necessary for running the queries in this notebook. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install azure-search-documents==11.6.0b3 --quiet\n",
    "! pip install python-dotenv azure-identity --quiet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Load all environment variables from the azd deployment\n",
    "import subprocess\n",
    "from io import StringIO\n",
    "from dotenv import load_dotenv\n",
    "result = subprocess.run([\"azd\", \"env\", \"get-values\"], stdout=subprocess.PIPE)\n",
    "load_dotenv(stream=StringIO(result.stdout.decode(\"utf-8\")))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "search_url = f\"https://{os.environ['AZURE_SEARCH_SERVICE']}.search.windows.net\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Perform a vector similarity search\n",
    "\n",
    "This example shows a pure vector search using the vectorizable text query, all you need to do is pass in text and your vectorizer will handle the query vectorization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azure.search.documents import SearchClient\n",
    "from azure.search.documents.models import VectorizableTextQuery\n",
    "from azure.identity import DefaultAzureCredential\n",
    "# Pure Vector Search\n",
    "query = \"What's a performance review?\"  \n",
    "  \n",
    "search_client = SearchClient(search_url, \"custom-embedding-index\", credential=DefaultAzureCredential())\n",
    "vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields=\"vector\", exhaustive=True)\n",
    "# Use the below query to pass in the raw vector query instead of the query vectorization\n",
    "# vector_query = RawVectorQuery(vector=generate_embeddings(query), k_nearest_neighbors=3, fields=\"vector\")\n",
    "  \n",
    "results = search_client.search(  \n",
    "    search_text=None,  \n",
    "    vector_queries= [vector_query],\n",
    "    select=[\"parent_id\", \"chunk_id\", \"chunk\"],\n",
    "    top=1\n",
    ")  \n",
    "  \n",
    "for result in results:  \n",
    "    print(f\"parent_id: {result['parent_id']}\")  \n",
    "    print(f\"Score: {result['@search.score']}\")  \n",
    "    print(f\"Content: {result['chunk']}\")  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Perform a hybrid search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hybrid Search\n",
    "query = \"What's a performance review?\"  \n",
    "  \n",
    "vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields=\"vector\", exhaustive=True)\n",
    "  \n",
    "results = search_client.search(  \n",
    "    search_text=query,  \n",
    "    vector_queries= [vector_query],\n",
    "    select=[\"parent_id\", \"chunk_id\", \"chunk\"],\n",
    "    top=1\n",
    ")  \n",
    "  \n",
    "for result in results:  \n",
    "    print(f\"parent_id: {result['parent_id']}\")  \n",
    "    print(f\"chunk_id: {result['chunk_id']}\")  \n",
    "    print(f\"Score: {result['@search.score']}\")  \n",
    "    print(f\"Content: {result['chunk']}\")  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Perform a hybrid search + Semantic reranking"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azure.search.documents.models import QueryType, QueryCaptionType, QueryAnswerType\n",
    "\n",
    "# Semantic Hybrid Search\n",
    "query = \"What's a performance review?\"\n",
    "\n",
    "vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields=\"vector\", exhaustive=True)\n",
    "\n",
    "results = search_client.search(  \n",
    "    search_text=query,\n",
    "    vector_queries=[vector_query],\n",
    "    select=[\"parent_id\", \"chunk_id\", \"chunk\"],\n",
    "    query_type=QueryType.SEMANTIC,  semantic_configuration_name='my-semantic-config', query_caption=QueryCaptionType.EXTRACTIVE, query_answer=QueryAnswerType.EXTRACTIVE,\n",
    "    top=2\n",
    ")\n",
    "\n",
    "semantic_answers = results.get_answers()\n",
    "for answer in semantic_answers:\n",
    "    if answer.highlights:\n",
    "        print(f\"Semantic Answer: {answer.highlights}\")\n",
    "    else:\n",
    "        print(f\"Semantic Answer: {answer.text}\")\n",
    "    print(f\"Semantic Answer Score: {answer.score}\\n\")\n",
    "\n",
    "for result in results:\n",
    "    print(f\"parent_id: {result['parent_id']}\")  \n",
    "    print(f\"chunk_id: {result['chunk_id']}\")  \n",
    "    print(f\"Score: {result['@search.score']}\")  \n",
    "    print(f\"Content: {result['chunk']}\")  \n",
    "\n",
    "    captions = result[\"@search.captions\"]\n",
    "    if captions:\n",
    "        caption = captions[0]\n",
    "        if caption.highlights:\n",
    "            print(f\"Caption: {caption.highlights}\\n\")\n",
    "        else:\n",
    "            print(f\"Caption: {caption.text}\\n\")\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
